Abstract

The computational complexity analysis of genetic programming (GP) has been
started recently in [7] by analyzing simple (1+1) GP algorithms for the
problems ORDER and MAJORITY. In this paper, we study how taking the complexity
as an additional criterion influences the runtime behavior. We consider
generalizations of ORDER and MAJORITY and present a computational complexity
analysis of (1+1) GP using multi-criteria fitness functions that take into
account the original objective and the complexity of a syntax tree as a
secondary measure. Furthermore, we study the expected time until
population-based multi-objective genetic programming algorithms have computed
the Pareto front when taking the complexity of a syntax tree as an equally
important objective.

1 Introduction

Genetic programming (GP) [TS] is an evolutionary computation approach that
evolves computer programs for a given task. This type of algorithm has been
shown to be very successful in various fields such as symbolic regression,
financial trading, and bioinformatics. We refer the interested reader to Poli
et al. [37] for a detailed presentation of GP. Various approaches such as
schema theory, Markov chain analysis, and approaches to measure problem
difficulty have been used to tackle GP from a theoretical point of view |2H].
Poli et al. [15] state explicitly that they expect to see computational
complexity results of genetic programming in the near future.

With this paper we start the computational complexity analysis of multi- 
objective genetic programming. This type of analysis has significantly increased 
the theoretical understanding of other types of evolutionary algorithms (see 
the books Pl US] for a comprehensive presentation). For various combinatorial
optimization problems such as minimum spanning trees [25], minimum multi-cuts
[231 [53], and covering problems [5], it has been shown that multi-objective
models provably lead to more efficient evolutionary algorithms. 

Initial steps in the computational complexity analysis of genetic programming
have been made by Durrett et al. [7]. They have studied simple mutation-based
genetic programming algorithms on the problems ORDER and MAJORITY introduced
in [T3]. Furthermore, the computational complexity of GP has been studied in
the PAC learning framework [TS] defined by Valiant [3D] and for the Max
Problem [17] introduced by Gathercole and Ross [TO].

Classical genetic programming often suffers from the occurrence of bloat [19[
13], i.e. the growth of parts of the syntax tree that do not contribute to the
functionality of the program. Due to this, different mechanisms for handling
bloat are often incorporated in GP algorithms (see e.g. [2T1 [5H1 [T]). A
simple approach to deal with bloat in GP is to favor solutions of lower complexity 
if two solutions have the same function value with respect to the given objective 
function. This leads to a multi-criteria fitness function which is composed of the 
original function to be optimized and a function assigning a complexity value 
to a given solution. Still there is a total ordering on the set of possible solutions 
and a solution would be considered as optimal if it has the smallest complexity 
among all solutions that achieve the highest function value with respect to the 
original goal function. 

Another way of dealing with the bloat problem is to use a multi-objective 
approach where the original function and the complexity are equally important. 
This induces a partial order on the set of possible solutions as the original
function and the complexity usually trade off against each other. An advantage of this
approach is that solutions of different complexity are generated which gives 
practitioners insights on how quality trades off against complexity. Such an 
approach is taken in one of the most popular genetic programming tools called 
DataModeler [S] which allows the user to compute the trade-offs with respect 
to the quality of the model and its complexity. In this case, often not the best 
solution with respect to the original function is used but a solution that is still 
of good quality while having a lower complexity. 

We introduce a population-based genetic programming algorithm for multi- 
objective optimization called SMO-GP that is motivated by the computational 
complexity analysis of an evolutionary multi-objective algorithm called SEMO. 
This algorithm has been considered in several computational complexity studies 
for binary search spaces [101 [13 [23 IM [21H [12]. SMO-GP starts with a single
solution, produces in each iteration one offspring, and stores the set of differ- 
ent trade-offs with respect to the given objective functions in the population. 
We study the effect of using the mentioned multi-objective approach in genetic 
programming in a rigorous way. To do this, we study the computational com- 
plexity of SMO-GP with respect to the runtime that it requires to achieve the 
so-called Pareto front which is the set of all possible trade-offs of the original 
given function and the complexity measure. 

Throughout this paper, we consider the problems Weighted ORDER (WORDER) and
Weighted MAJORITY (WMAJORITY). These are generalizations of ORDER and MAJORITY
which have been analyzed in [7]. This generalization is similar to the
generalization of OneMax to the class of linear pseudo-Boolean functions in
the investigations of evolutionary algorithms working on binary strings [6].
The analysis of linear pseudo-Boolean functions has played a key role in the
analysis of evolutionary algorithms working on binary strings [311 151 [Hj.
This class of functions has also been examined in the context of ant colony
optimization, but determining the exact optimization time of simple ACO
algorithms for this class of functions is still a challenging open problem
[TB].

We think that understanding the behavior of simple GP algorithms on WORDER
and WMAJORITY will play a similar role in the computational complexity
analysis of GP. In this paper, we present first steps in understanding
the behavior of simple GP algorithms for these problems. In many cases, we 
consider GP algorithms carrying out one single mutation operation in each mu- 
tation step. This is comparable to randomized local search for binary strings. 
Our analyses provide important insights for the combination of the original 
function value and the complexity of the tree. We explicitly state that it is very 
interesting and challenging to analyze GP algorithms where a larger number of 
operations is possible in the mutation steps and list such topics for future work 
in the conclusions. 

The outline of the paper is as follows. In Section 2, we introduce the
problems that we consider in this paper. Section 3 examines the impact of the
complexity as a secondary measure and presents runtime analyses for (1+1) GP
on WORDER and WMAJORITY. In Section 4, we turn to multi-objective optimization
and analyze the time until SMO-GP has computed the whole Pareto front. We
finish with some conclusions and topics for future work.

2 Preliminaries 

We consider tree-based genetic programming, where a possible solution to a 
given problem is given by a syntax tree. The inner nodes of such a tree are 
labelled by function symbols from a set F and the leaves of the tree are labelled 
by terminals from a set T. 

We examine the problems Weighted ORDER (WORDER) and Weighted MAJORITY
(WMAJORITY) which are generalizations of ORDER and MAJORITY analyzed in [7].
For both, the only function is the join operation (denoted by J). The terminal
set T is a set of 2n variables, where $\bar{x}_i$ is the complement of $x_i$.

• $F := \{J\}$, J has arity 2.

• $T := \{x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n\}$

A valid tree for n = 6 is shown in Figure 1. We attach to each variable $x_i$
a weight $w_i \in \mathbb{R}$, $1 \le i \le n$, such that the variables can
differ in their contribution to the overall fitness of a tree. Without loss of
generality, we assume that $w_1 \ge w_2 \ge \dots \ge w_n > 0$ holds throughout
this paper. This assumption allows for an easier presentation, but is no
restriction to the general




Figure 1: Example tree X with C(X) = 19 



Init: $l$ is an empty leaf list, S is an empty statement list.

1. Parse the tree X inorder and insert each leaf at the rear of $l$ as it is visited.

2. Generate S by parsing $l$ front to rear and adding ("expressing") a leaf to
S only if neither it nor its complement is yet in S (i.e. has not yet been
expressed).

3. WORDER(X) = $\sum_{x_i \in S} w_i$.

Figure 2: Computation of WORDER(X)



case as our algorithms treat positive and negative variables in the same way, 
and do not give preference to any specific variable. 

For a given syntax tree X, the value of the tree is computed by parsing the
tree inorder. The weight $w_i$ of a variable $x_i$ contributes to the fitness
iff $x_i$ is positive and contained in the set S of the evaluation function.
For WORDER, $x_i$ is contained in S iff it is present in the tree and there is
no $\bar{x}_i$ that is visited in the inorder parse before $x_i$. For
WMAJORITY, $x_i$ is contained in S iff $x_i$ is present in the tree and the
number of $x_i$ variables in X is at least as high as the number of
$\bar{x}_i$ variables in X. For a given tree X, their evaluation is shown in
Figures 2 and 3. ORDER and MAJORITY arise as special cases where $w_i = 1$,
$1 \le i \le n$, holds.

We illustrate both problems by an example. Let n = 6 and $w_1 = 13$,
$w_2 = 11$, $w_3 = 8$, $w_4 = 7$, $w_5 = 5$, $w_6 = 3$. For the tree X shown in
Figure 1, we get (after the inorder parse)



$l = (x_1, \bar{x}_4, x_2, \bar{x}_1, \bar{x}_3, \bar{x}_6, x_4, x_3, \bar{x}_5, x_3)$



Init: $l$ is an empty leaf list, S is an empty statement list.

1. Parse the tree X inorder and insert each leaf at the rear of $l$ as it is visited.

2. For $i \le n$: if count($x_i \in l$) $\ge$ count($\bar{x}_i \in l$) and count($x_i \in l$) $\ge 1$, add $x_i$ to S.

3. Return WMAJORITY(X) = $\sum_{x_i \in S} w_i$.

Figure 3: Computation of WMAJORITY(X)

For WORDER, we get $S = (x_1, \bar{x}_4, x_2, \bar{x}_3, \bar{x}_6, \bar{x}_5)$ and

WORDER(X) = $w_1 + w_2$ = 13 + 11 = 24.

For WMAJORITY, we get $S = (x_1, x_2, x_3, x_4)$ and

WMAJORITY(X) = $w_1 + w_2 + w_3 + w_4$ = 13 + 11 + 8 + 7 = 39.

The complexity C of a given tree X is the number of nodes it contains. For the
tree X given in Figure 1, C(X) = 19 holds.
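
The evaluation procedures of Figures 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a tree is represented solely by its inorder leaf list, a positive integer i stands for $x_i$ and -i for its complement, and all function names are hypothetical. The signs in `leaves` are chosen to be consistent with the values 24, 39 and C(X) = 19 reported in the example; the sign of the second occurrence of variable 1 is an assumption and does not influence either value.

```python
def worder(leaves, weights):
    """WORDER: w_i counts iff the first occurrence of variable i is positive."""
    seen, total = set(), 0
    for leaf in leaves:
        var = abs(leaf)
        if var not in seen:          # first time x_i or its complement is visited
            seen.add(var)
            if leaf > 0:             # only positive expressed variables contribute
                total += weights[var]
    return total


def wmajority(leaves, weights):
    """WMAJORITY: w_i counts iff x_i occurs at least once and at least as
    often as its complement."""
    total = 0
    for var, weight in weights.items():
        positive = leaves.count(var)
        negative = leaves.count(-var)
        if positive >= 1 and positive >= negative:
            total += weight
    return total


def complexity(leaves):
    """Number of nodes of a binary tree with the given leaves (0 if empty)."""
    return 2 * len(leaves) - 1 if leaves else 0


weights = {1: 13, 2: 11, 3: 8, 4: 7, 5: 5, 6: 3}
leaves = [1, -4, 2, -1, -3, -6, 4, 3, -5, 3]   # assumed signs, see lead-in

print(worder(leaves, weights))      # 24
print(wmajority(leaves, weights))   # 39
print(complexity(leaves))           # 19
```

Working on the leaf list is sufficient here because J is the only function symbol, so both fitness values depend only on the inorder sequence of leaves.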

There are two problems we will consider. The first one is the single-objective
problem of computing a solution X which maximizes F. During the optimization
run, our algorithms are allowed to use the function C as an additional
criterion if two solutions have the same function value with respect to F. The
second problem is the computation of the Pareto front for the multi-objective 
problem given by F and C. 

We study genetic programming algorithms which take into account the orig- 
inally given problem as well as the complexity of a given solution. We can 
formulate this as a multi-objective problem which assigns different objective 
values to a given solution. Throughout this paper, we assume that we have one 
objective function F that should be maximized and have the complexity C of a 
GP-syntax tree as the second objective which should be minimized. F can be 
considered as the original problem at hand, and the minimization of C allows
us to cope with the bloat problem. Our algorithms work with the multi-criteria
fitness function MO-F(X)= (F(X), C(X)). 

Consequently, we obtain the following problems when adding the complexity
of a solution X as the second criterion.

• MO-WORDER(X) = (WORDER(X), C(X))

• MO-WMAJORITY(X) = (WMAJORITY(X), C(X))

For the special case where $w_i = 1$, $1 \le i \le n$, holds, we obtain the problems

• MO-ORDER(X) = (ORDER(X), C(X))



Mutate Y by applying HVL-Prime k times. For each application, randomly
choose to either substitute, insert, or delete.

• If substitute, replace a randomly chosen leaf of Y with a new leaf $u \in T$
selected uniformly at random.

• If insert, choose a node v in Y uniformly at random and select $u \in T$
uniformly at random. Replace v with a join node whose children are u
and v, with the order of the children chosen randomly.

• If delete, randomly choose a leaf node v of Y, with parent p and sibling
u. Replace p with u and delete p and v.



Figure 4: Mutation operator 



• MO-MAJORITY(X) = (MAJORITY(X), C(X))

which add the complexity C as an additional objective to the problems 
ORDER and MAJORITY. We will pay special attention to these problems 
and examine how the use of the additional complexity objective influences the 
runtime behavior as it allows a direct comparison to the results obtained in [7] . 
Note that an alternative way of modelling our problems is to work directly with
the weights $w_i$ and $\bar{w}_i$, $1 \le i \le n$, as variables in the tree. Such a presentation
is equivalent to the one we have chosen and would lead to the same results as 
presented in this paper. 

We consider simple mutation-based genetic programming algorithms. They
use the operator HVL-Prime which has been part of the (1+1) GP variants
analyzed in [7]. HVL-Prime can produce trees of variable length and is
based on three different operations, namely insert, substitute and delete. Each
application of HVL-Prime chooses one of these operations randomly. Throughout
this paper, randomly chosen always means randomly chosen with respect to
the uniform distribution. The complete description of the mutation operator is
given in Figure 4. For its application, a parameter k determining the number of
HVL-Prime operations has to be chosen. As in [7], we consider two possibilities.
In the case of single-operations, k = 1 holds. For multi-operations, k is chosen
according to 1 + Pois(1), where Pois(1) denotes the Poisson distribution with
parameter $\lambda = 1$.
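
The mutation operator of Figure 4 could be sketched as follows. This is a minimal illustration, not the authors' code. The tree encoding (None for the empty tree, a signed integer for a leaf, a pair for a join node) and all function names are hypothetical. Two behaviours that Figure 4 does not specify are assumptions: on the empty tree only an insertion has an effect, and deleting the only leaf yields the empty tree.

```python
import math
import random


def node_paths(tree, prefix=()):
    """Yield (path, is_leaf) for every node; a path is a tuple of child indices."""
    if isinstance(tree, tuple):
        yield prefix, False
        for side in (0, 1):
            yield from node_paths(tree[side], prefix + (side,))
    else:
        yield prefix, True


def subtree(tree, path):
    for side in path:
        tree = tree[side]
    return tree


def replaced(tree, path, new):
    """Return a copy of tree in which the node at path is replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replaced(tree[path[0]], path[1:], new)
    return tuple(children)


def size(tree):
    """Complexity C: number of nodes of the tree."""
    if tree is None:
        return 0
    if isinstance(tree, tuple):
        return 1 + size(tree[0]) + size(tree[1])
    return 1


def hvl_prime(tree, n, rng):
    """One application: substitute, insert or delete, chosen uniformly."""
    op = rng.choice(("substitute", "insert", "delete"))
    u = rng.choice((1, -1)) * rng.randint(1, n)   # uniform over the 2n terminals
    if tree is None:                               # assumption, see lead-in
        return u if op == "insert" else None
    nodes = list(node_paths(tree))
    leaf_paths = [path for path, is_leaf in nodes if is_leaf]
    if op == "substitute":
        return replaced(tree, rng.choice(leaf_paths), u)
    if op == "insert":
        path, _ = rng.choice(nodes)
        v = subtree(tree, path)
        return replaced(tree, path, (u, v) if rng.random() < 0.5 else (v, u))
    path = rng.choice(leaf_paths)                  # delete
    if not path:                                   # assumption: last leaf removed
        return None
    sibling = subtree(tree, path[:-1])[1 - path[-1]]
    return replaced(tree, path[:-1], sibling)


def poisson1(rng):
    """Sample Pois(1) with Knuth's multiplication method."""
    threshold, count, product = math.exp(-1.0), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mutate(tree, n, rng, multi=False):
    """k = 1 for single-operations, k = 1 + Pois(1) for multi-operations."""
    k = 1 + poisson1(rng) if multi else 1
    for _ in range(k):
        tree = hvl_prime(tree, n, rng)
    return tree
```

Trees are kept immutable so that the offspring never shares mutable state with its parent, which mirrors the step Set Y := X followed by mutation of Y.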



3 (1+1) GP 



In this section, we consider (1+1) GP algorithms working with the multi-criteria
fitness functions introduced in the previous section. The algorithms are simple
hill-climbers that explore their neighbourhood in dependence of the mutation
operator. They differ from the ones analyzed in [7] only in the selection step.



The outline of (1+1) GP is shown in Algorithm 1. It starts with an initial
solution X and produces in each iteration one single offspring Y by mutation.
Y replaces X if it is favored according to the selection mechanism.

Algorithm 1 ((1+1) GP). 

1. Choose an initial solution X.

2. Repeat

• Set Y := X.

• Apply mutation to Y.

• If selection favors Y over X, then X := Y.

We will consider the algorithm (1+1) GP-single which applies the mutation
operator HVL-Prime once in each mutation step, i.e. the mutation operator
given in Figure 4 is used with k = 1. Analyzing the computational complexity of
this algorithm, we are interested in the expected number of fitness evaluations 
until the algorithm has found an optimal solution for the given problem F for 
the first time. This is called the expected optimization time of the analyzed 
algorithm. 

The worst case results for (1+1) GP-single obtained in [7] depend on the
maximum size of the tree (denoted by $T_{max}$) that is encountered during the
optimization process. To be more precise, the upper bound for (1+1) GP-single
is $O(nT_{max})$ for ORDER and $O(n^2 T_{max} \log\log n)$ for MAJORITY. As
$T_{max}$ is not known in advance, it is more desirable to have runtime bounds
that only depend on the input and the size of the initial tree. In such a
case, the user has complete knowledge on how much worse such a bound can get.
Especially in the light of the bloat problem, $T_{max}$ can be assumed to be
quite large for various types of problems. We will analyze our algorithms in
dependence of the tree size of the initial solution (denoted by $T_{init}$).

The key point of our study is to examine how the complexity of a solution as 
the secondary measure influences the runtime. The selection mechanism for the 
(1+1) GP-single variant studied in [7] ((1+1) GP-single on F) and the selection 
in our algorithm ((1+1) GP-single on MO-F) are shown in Figure 5. Note that
using (1+1) GP-single on MO-F presents a parsimony approach which is quite 
common in genetic programming to deal with the bloat problem. 
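
As a rough illustration of Algorithm 1 with a parsimony selection, the following Python sketch runs (1+1) GP-single on MO-ORDER. Several points are assumptions rather than statements of the paper: the selection rule is taken to favor Y if F(Y) > F(X), or if F(Y) = F(X) and C(Y) ≤ C(X) (whether ties in both criteria are accepted is a guess); the tree is flattened to its inorder leaf list; and `mutate_single` is a simplified stand-in for HVL-Prime that picks insertion positions uniformly in that list, which is not the same distribution as choosing a node of the tree. All names are hypothetical.

```python
import random


def order(leaves):
    """ORDER value: number of variables whose first occurrence is positive."""
    seen, value = set(), 0
    for leaf in leaves:
        if abs(leaf) not in seen:
            seen.add(abs(leaf))
            if leaf > 0:
                value += 1
    return value


def complexity(leaves):
    return 2 * len(leaves) - 1 if leaves else 0


def favors_mo_f(y, x):
    """Assumed parsimony selection on (F, C) pairs: F first, then C."""
    (fy, cy), (fx, cx) = y, x
    return fy > fx or (fy == fx and cy <= cx)


def mutate_single(leaves, n, rng):
    """One simplified substitute, insert or delete step on the leaf list."""
    leaves = list(leaves)
    op = rng.choice(("substitute", "insert", "delete"))
    u = rng.choice((1, -1)) * rng.randint(1, n)
    if op == "insert":
        leaves.insert(rng.randint(0, len(leaves)), u)
    elif leaves and op == "substitute":
        leaves[rng.randrange(len(leaves))] = u
    elif leaves and op == "delete":
        del leaves[rng.randrange(len(leaves))]
    return leaves


def one_plus_one_gp_single(initial, n, rng, max_steps=200_000):
    """Return the final leaf list and the number of iterations used."""
    x = list(initial)
    fx = (order(x), complexity(x))
    for step in range(1, max_steps + 1):
        y = mutate_single(x, n, rng)
        fy = (order(y), complexity(y))
        if favors_mo_f(fy, fx):
            x, fx = y, fy
        if fx == (n, 2 * n - 1):      # all n variables expressed, no redundancy
            return x, step
    return x, max_steps


rng = random.Random(42)
n = 8
bloated = [rng.choice((1, -1)) * rng.randint(1, n) for _ in range(40)]
final, steps = one_plus_one_gp_single(bloated, n, rng)
print(sorted(final), steps)
```

Starting from a bloated tree, the run first sheds leaves that are not expressed and then adds the missing positive variables, which is the two-phase behaviour the analysis in this section relies on.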

3.1 Analysis 

We start our analysis of (1+1) GP-single by presenting a general lower bound 
on the expected optimization time. This bound holds independently of the 
chosen fitness function and is a direct consequence of the coupon collector's 
theorem '2?. 

Theorem 2. Let X be the empty tree, then the expected time until (1+1) GP-single
has produced an optimal solution for MO-ORDER and MO-MAJORITY is
$\Omega(n \log n)$.

Figure 5: Selection for (1+1) GP



Proof. In order to produce an optimal solution for the given problems, each
positive variable has to be introduced at least once into the tree. The probability
to introduce one specific variable $x_i$ in the next step is at most
$\frac{2}{3} \cdot \frac{1}{2n}$. Using the coupon collector's theorem, the result
follows immediately. □

Theorem 2 shows that we cannot expect an upper bound better than $O(n \log n)$.
This is a typical bound for many simple evolutionary algorithms as they usually
encounter the coupon collector effect. In the following, we present upper bounds
on the runtime of (1+1) GP-single working with the multi-criteria fitness
functions. Theorem 2 implies that the upper bounds presented in the following
are tight.

A variable Xi is called expressed if it contributes to the overall fitness of 
our original problem F. This is the case if a variable is positive and contained 
in the statement list S of our evaluation function. We call a solution X
non-redundant if the number of expressed variables is k and its complexity is
$2k - 1$. Furthermore, the empty tree is called non-redundant as well. For the
problems
we consider, any tree that does not fall into the non-redundant category can 
be improved with respect to complexity without decreasing its fitness. Solu- 
tions where such improvements with respect to the complexity are possible are 
called redundant. The key idea of our analysis is to show that the algorithm 
quickly eliminates redundant variables. After these redundant variables have 
been removed, the algorithm can introduce missing variables at any position of 
the tree. 

We present upper bounds for (1+1) GP-single on MO-WORDER and MO-WMAJORITY
which are tight if $T_{init} = O(n \log n)$ holds.

Theorem 3. The expected optimization time of (1+1) GP-single on MO-WORDER
is $O(T_{init} + n \log n)$.

Proof. For our analysis we consider two phases. First we analyze the time until 
the tree has become non-redundant. Afterwards, we bound the time to obtain 
an optimal solution. 

We claim that after an expected number of $O(T_{init} + n \log n)$ steps the
tree is non-redundant. Let k be the number of expressed variables and s be the
number of leaves in the tree. Then there are $s - k$ variables that can be
deleted without changing the WORDER-value. Such a step reduces the complexity
of the tree and is therefore accepted. The probability for such a deletion is
at least

$$\frac{s - k}{3s}.$$

We show that the value of $s - k$ cannot increase during the run of the
algorithm. Obviously k cannot decrease as selection is primarily based on
WORDER. The number of leaves s can only increase by 1 if a step is an
improvement according to WORDER. In this case, $s - k$ does not change which
shows that $s - k$ does not increase during the run of the algorithm. Using the
method of fitness-based partitions (see e.g. Chapter 4 in [5B]) the expected time
until $s = k$ holds is upper bounded by

$$O(n \log n) + O(T_{init}).$$

Now, we consider the time to reach an optimal solution and work under the
assumption that X is non-redundant. Note that this invariant is maintained as
we have shown that the difference $s - k$ cannot increase during the run of the
algorithm. Let $n - k$ be the number of unexpressed variables after a
non-redundant tree has been obtained for the first time. Any of these $n - k$
variables can be inserted at any position in the tree in order to improve the
WORDER-value. In total, there are 2n variables to choose from. Hence, the
probability to achieve an improvement is at least

$$\frac{1}{3} \cdot \frac{n - k}{2n}.$$

Using again the method of fitness-based partitions, the expected time to
achieve an optimal tree which consists of the variables $x_i$, $1 \le i \le n$,
is upper bounded by

$$\sum_{k=0}^{n-1} \left(\frac{1}{3} \cdot \frac{n - k}{2n}\right)^{-1} = 6n \cdot \sum_{k=0}^{n-1} \frac{1}{n - k} = O(n \log n).$$



Summing up the runtimes for the two phases, the expected optimization
time of (1+1) GP-single on MO-WORDER is $O(T_{init} + n \log n)$. □
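
The second phase of this proof sums the expected waiting times over the fitness levels k = 0, ..., n - 1. As a small sanity check, the following Python sketch evaluates that sum exactly, using the success probability (1/3) · (n - k)/(2n) from the proof, and compares it with 6n times the n-th harmonic number, which grows like n ln n. The function names are hypothetical and the block is meant as a numerical illustration only.

```python
from fractions import Fraction


def fitness_level_bound(success_probs):
    """Fitness-based partitions: sum of the expected waiting times 1/p."""
    return sum(1 / p for p in success_probs)


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, n + 1))


def phase_two_bound(n):
    """Bound for reaching the optimum from a non-redundant tree."""
    probs = [Fraction(1, 3) * Fraction(n - k, 2 * n) for k in range(n)]
    return fitness_level_bound(probs)


for n in (1, 10, 100):
    bound = phase_two_bound(n)
    assert bound == 6 * n * harmonic(n)
    print(n, float(bound))
```

Exact rational arithmetic is used so that the identity with 6n · H_n can be checked with equality rather than up to floating-point error.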

We now transfer the previous result to the problem WMAJORITY. The
analysis carried out in [7] for (1+1) GP-single on MAJORITY has to take into
account random walk arguments for dealing with plateaus in the search space
which leads to a runtime bound of $O(n^2 T_{max} \log\log n)$ for MAJORITY.

Using MO-WMAJORITY, we do not face the difficulty of a plateau during the
optimization, unlike the (1+1) GP variants considered in [7]. The random
walk is averted as solutions with the same WMAJORITY-value, but a higher
complexity are not accepted by the algorithm. In fact, the additional search
direction given by the information on the size of the tree leads to a similar
fitness landscape as for MO-WORDER. This leads to the following result.

Theorem 4. The expected optimization time of (1+1) GP-single on MO-WMAJORITY
is $O(T_{init} + n \log n)$.

Proof. The proof of Theorem 3 for MO-WORDER has only used the fact that
the difference $s - k$ cannot increase during the run of the algorithm and that
later on (in the second phase) each non-expressed variable can be inserted at any
position in the current tree. Both properties also hold for MO-WMAJORITY
which implies that we get the same upper bound of $O(T_{init} + n \log n)$. □

4 Multi-Objective Algorithms

The previous section has shown that using the complexity of the syntax tree as a 
secondary measure can provably lead to better upper bounds on the runtime of 
simple genetic programming algorithms. Depending on the complexity that one 
allows for a given problem, the value of the best solution for the original problem 
F may vary. In the case of multi-objective optimization, we are interested in the 
different trade-offs between the original problem F and the complexity C. In 
this section, we analyze simple multi-objective genetic programming algorithms 
until they have computed the whole Pareto front for a given problem MO-F(X) 
= (F(X), C(X)). 

4.1 Multi-Objective Genetic Programming 

The idea in multi-objective optimization is to treat the given criteria as equally 
important. We consider the following relations on search points which will later 
on be used in the selection step of our algorithms. 

1. A solution Y weakly dominates a solution X (denoted by $Y \succeq X$) iff
$(F(Y) \ge F(X) \wedge C(Y) \le C(X))$.

2. A solution Y dominates a solution X (denoted by $Y \succ X$) iff
$(Y \succeq X) \wedge (F(Y) > F(X) \vee C(Y) < C(X))$.

3. Two solutions X and Y are called incomparable iff neither $X \succeq Y$ nor
$Y \succeq X$ holds.
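
The three relations can be written directly as predicates on objective vectors. In the following Python sketch a solution is represented only by its pair (F, C), with F to be maximized and C to be minimized; this representation and the function names are assumptions made for illustration.

```python
def weakly_dominates(y, x):
    """Y weakly dominates X: F(Y) >= F(X) and C(Y) <= C(X)."""
    return y[0] >= x[0] and y[1] <= x[1]


def dominates(y, x):
    """Y dominates X: weak dominance plus strict improvement in F or C."""
    return weakly_dominates(y, x) and (y[0] > x[0] or y[1] < x[1])


def incomparable(x, y):
    """Neither solution weakly dominates the other."""
    return not weakly_dominates(x, y) and not weakly_dominates(y, x)


small = (24, 3)     # (F, C) of a hypothetical compact tree
bloated = (24, 19)  # same fitness, larger tree
tiny = (13, 1)      # lower fitness, lower complexity

print(dominates(small, bloated))   # True
print(incomparable(small, tiny))   # True
print(dominates(small, small))     # False
```

Note that every solution weakly dominates itself, while no solution dominates itself.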

A solution is called Pareto optimal iff it is not dominated by any other 
solution in the search space S. The set of Pareto optimal solutions is called 
the Pareto optimal set and the set of corresponding objective vectors is called 
the Pareto front. The classical goal in multi-objective optimization is to com- 
pute for each objective vector of the Pareto front a corresponding Pareto op- 
timal solution. We introduce and analyze an algorithm called Simple Multi- 
Objective Genetic Programming (SMO-GP) which is motivated by the Simple 
Multi-Objective Optimizer (SEMO) algorithm that has frequently been consid- 
ered in the computational complexity analysis of evolutionary multi-objective 
optimization algorithms for binary search spaces [201 IHl US |211 1131 IHl HI- 
SMO-GP starts with a single solution and produces in each iteration one single 
offspring Y by mutating an individual of the current population P. The popu- 
lation consists in each iteration of a set of solutions that are non-dominated by 
any other solution seen so far during the run of the algorithm. In the selection 
step, the offspring Y is added to the population P iff it is not dominated by any 
other solution in P. If Y is added to P all solutions that are weakly dominated 
by Y are removed from P. 

Algorithm 5 (SMO-GP).

1. Choose an initial solution X.

2. Set $P := \{X\}$.

3. Repeat

• Choose $X \in P$ uniformly at random.

• Set Y := X.

• Apply mutation to Y.

• If $\{Z \in P \mid Z \succ Y\} = \emptyset$,
set $P := (P \setminus \{Z \in P \mid Y \succeq Z\}) \cup \{Y\}$.

We consider the algorithms SMO-GP-single and SMO-GP-multi. Both use
the mutation operator given in Figure 4. For SMO-GP-single, k = 1 holds, and
for SMO-GP-multi the parameter k is chosen according to 1 + Pois(1). Our
goal is to investigate the expected number of iterations until our algorithms 
have computed a population which contains for each Pareto optimal objective 
vector a corresponding solution. We call this the expected optimization time of 
the multi-objective genetic programming algorithms. 
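
The selection step of SMO-GP can be sketched as a single population update. The following Python code is an illustration under assumptions: individuals are reduced to their objective vectors (F, C), whereas the algorithm itself stores the corresponding trees, and the function names are hypothetical.

```python
def weakly_dominates(y, x):
    return y[0] >= x[0] and y[1] <= x[1]


def dominates(y, x):
    return weakly_dominates(y, x) and (y[0] > x[0] or y[1] < x[1])


def update_population(population, y):
    """Add y unless some member dominates it; drop members y weakly dominates."""
    if any(dominates(z, y) for z in population):
        return population
    return [z for z in population if not weakly_dominates(y, z)] + [y]


population = [(0, 0)]                       # empty tree: F = 0, C = 0
for offspring in [(1, 1), (1, 3), (2, 5), (2, 3)]:
    population = update_population(population, offspring)
print(sorted(population))                   # [(0, 0), (1, 1), (2, 3)]
```

Because an accepted offspring removes every member it weakly dominates, the population holds at most one individual per fitness value, which is the property used to bound the population size in the proofs below.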

Our multi-objective model trades off the function value against the com- 
plexity value. A special Pareto optimal solution of the multi-objective model is 
the empty tree which has the lowest possible complexity value. The following 
lemma bounds the expected time until the empty tree has been included into 
the population P when considering an arbitrary problem MO-F. We denote by 
$T_{init}$ the size of the tree of the initial solution and analyze the time
to include the empty tree in dependence of $T_{init}$ and the number of
different fitness values of the problem F.

Lemma 6. Let $T_{init}$ be the size of the initial solution and k be the number
of different fitness values of a problem F. Then the expected time until the
population of SMO-GP-single and SMO-GP-multi applied to MO-F contains the empty
tree is $O(kT_{init})$.

Proof. As the problem F has at most k different fitness values, the population
size of the algorithms is bounded by k. At each time step we consider the
solution with the lowest complexity in the population. This solution is selected
for mutation with probability at least $1/k$. A single deletion operation applied
to this individual leads to a new solution of lower complexity. The probability
for such a mutation step is at least $1/(3ek)$. Summing up the different values
for the minimal tree size in the population, we get

$$\sum_{i=1}^{T_{init}} 3ek = 3ekT_{init} = O(kT_{init})$$

as an upper bound on the expected time until the empty tree is included in the
population. □

4.2 ORDER and MAJORITY 

We now examine how SMO-GP-single and SMO-GP-multi can compute the
Pareto front for the multi-objective problems given by MO-ORDER and
MO-MAJORITY. In the following, we show that both algorithms compute the whole
Pareto front for both problems in expected time $O(nT_{init} + n^2 \log n)$.

We remark that a lower bound of $\Omega(n^2 \log n)$ holds for both algorithms and
both problems when starting with the empty tree. This bound can be obtained
by using the coupon collector's theorem in a similar way as in Theorem 2 and
taking into account the additional factor of n for the population size.

Theorem 7. The expected optimization time of SMO-GP-single and SMO-GP-multi
on MO-ORDER is $O(nT_{init} + n^2 \log n)$.

Proof. Due to Lemma 6, the empty tree is produced for any MO-F problem
having k different fitness values after an expected number of $O(kT_{init})$ steps.
The number of different fitness values for ORDER is $n + 1$ which implies that
the empty tree is introduced into the population after an expected number of
$O(nT_{init})$ steps. This solution will never be removed from the population as it
is the unique solution having complexity 0.

Assuming that the empty tree has been introduced into the population, we
analyze the time until the algorithm has produced solutions that are Pareto
optimal and have ORDER-values $1, 2, \ldots, n$. Each tree having i leaves has
exactly $i - 1$ inner nodes. Hence, a solution that has ORDER-value i has
complexity at least $2i - 1$, $1 \le i \le n$. A solution with ORDER-value i is
Pareto optimal iff it has complexity exactly $2i - 1$. We assume that the
population contains all Pareto optimal solutions with ORDER-value j,
$0 \le j \le i$. Then choosing the Pareto optimal solution X with ORDER(X) = i
for mutation and inserting one of the remaining $n - i$ non-expressed variables
produces a population which includes for each Pareto optimal objective vector
with ORDER-value j, $0 \le j \le i + 1$, a corresponding solution. Note that this
operation produces from a solution of complexity $2i - 1$ a solution of
complexity $2i - 1 + 2 = 2(i + 1) - 1$ as an insertion introduces a new leaf and
a new join node.

We have to analyze the probability that such a step happens in the next
iteration. Choosing X for mutation has probability at least $1/(n + 1)$ as the
population size is upper bounded by $n + 1$. A mutation step carrying out just
one single operation happens with probability at least $1/e$ and an insertion
operation is chosen with probability 1/3. Finally, $n - i$ variables (among the 2n
variables in T) can be inserted to produce the Pareto optimal solution of
ORDER-value $i + 1$. In total, the probability of producing the Pareto optimal
solution of ORDER-value $i + 1$ is at least

$$\frac{1}{n + 1} \cdot \frac{1}{3e} \cdot \frac{n - i}{2n}.$$

We use the method of fitness-based partitions according to the different 
values of i. This implies that the expected time until all Pareto optimal solutions 
have been produced after the empty tree has been included in the population is 
upper bounded by 



$$\sum_{i=0}^{n-1} \left(\frac{1}{n + 1} \cdot \frac{1}{3e} \cdot \frac{n - i}{2n}\right)^{-1} = 6en(n + 1) \cdot \sum_{i=0}^{n-1} \frac{1}{n - i} = O(n^2 \log n).$$

Taking into account the expected time to produce the empty tree, the expected
time until the whole Pareto front of MO-ORDER has been computed is
$O(nT_{init} + n^2 \log n)$. □

For MO-MAJORITY we can adapt the ideas of the previous proof. Again 
the algorithms do not encounter the problem of plateaus in the search space 
which makes the optimization process much easier than for MAJORITY. 

Theorem 8. The expected optimization time of SMO-GP-single and SMO-GP-multi
on MO-MAJORITY is $O(nT_{init} + n^2 \log n)$.

Proof. For MO-MAJORITY we can follow the same arguments. The number
of different fitness values of MAJORITY is upper bounded by $n + 1$ and this is
an upper bound on the population size. The empty tree is produced after an
expected number of $O(nT_{init})$ steps according to Lemma 6. Having a population
which contains all Pareto optimal solutions with MAJORITY-values j,
$0 \le j \le i$, a population which includes for each Pareto optimal objective
vector with MAJORITY-value j, $0 \le j \le i + 1$, a corresponding solution is
obtained by inserting one of the non-expressed variables into the Pareto
optimal individual X with MAJORITY(X) = i. The probability that such a step
happens in the next iteration is at least

$$\frac{1}{n + 1} \cdot \frac{1}{3e} \cdot \frac{n - i}{2n}$$

and summing up the expected waiting times as done in the proof of Theorem 7
completes the proof. □

4.3 Weighted ORDER and MAJORITY 

In our previous investigations of MO-ORDER and MO-MAJORITY each ex- 
pressed variable contributed an amount of 1 to the overall fitness of a solution. 
In this subsection, we extend our investigations to MO-WORDER and MO- 
WMAJORITY. 

Considering these problems, it is in principle possible to have an exponential
number of incomparable solutions. Assume for example that $w_i = 2^{n-i}$,
$1 \le i \le n$, holds. Then there are $2^n$ different fitness values for WORDER
and WMAJORITY. Furthermore, one can construct trees for these solutions such
that no solution dominates any other solution in this set.
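
The count of fitness values can be verified by brute force for small n. This is a minimal sketch, assuming the example weights are w_i = 2^(n-i) and that a fitness value is the sum of the weights of the expressed variables. The function name `distinct_fitness_values` is hypothetical.

```python
def distinct_fitness_values(n: int) -> set:
    """All weighted fitness values reachable with weights w_i = 2^(n-i)."""
    weights = [2 ** (n - i) for i in range(1, n + 1)]
    values = set()
    # Each bit mask selects one subset of expressed variables.
    for mask in range(2 ** n):
        values.add(sum(w for k, w in enumerate(weights) if (mask >> k) & 1))
    return values
```

Because the weights are distinct powers of two, every subset of expressed variables yields a different sum, so the set has 2^n elements.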

Note that such a set of solutions does not constitute the Pareto front and
that the Pareto front has size n + 1. As stated in Section 2, we assume without
loss of generality that $w_1 \ge w_2 \ge \dots \ge w_n > 0$ holds in this paper.
Then the tree containing exactly the variables $x_1, \dots, x_i$ is Pareto optimal
and has complexity $2i - 1$, $1 \le i \le n$. Furthermore, the empty tree is
Pareto optimal, which gives us the whole Pareto front of size n + 1.
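
The structure of the Pareto front can be illustrated by enumeration. The sketch below makes two assumptions: the complexity of a tree is its number of nodes (2m - 1 for m leaves, 0 for the empty tree), and it suffices to enumerate non-redundant trees, since a redundant tree has the same fitness as a smaller tree. The helper names are hypothetical.

```python
from itertools import combinations


def complexity(num_leaves: int) -> int:
    """Number of nodes of a binary tree with the given number of leaves."""
    return 2 * num_leaves - 1 if num_leaves > 0 else 0


def dominates(a, b) -> bool:
    """a and b are (fitness, complexity) pairs; fitness is maximized,
    complexity is minimized."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


def pareto_front(weights):
    """Non-dominated objective vectors over all non-redundant trees."""
    n = len(weights)
    points = {
        (sum(weights[k] for k in subset), complexity(size))
        for size in range(n + 1)
        for subset in combinations(range(n), size)
    }
    front = (p for p in points if not any(dominates(q, p) for q in points))
    return sorted(front, key=lambda p: p[1])
```

For weights sorted in non-increasing order, the front consists of the empty tree and the n prefix sums of the weights, each paired with complexity 2i - 1.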

We consider the special case, where SMO-GP-single starts with a non- 
redundant solution. We show that in this case SMO-GP-single will not accept 
any redundant solution. This is the key idea for the following theorem. 

Theorem 9. Starting with a non-redundant initial solution, the expected
optimization time of SMO-GP-single on MO-WORDER and MO-WMAJORITY is
$O(n^3)$.

Proof. We first study the population size and show that it is in each iteration 
at most n + 1. We claim that the population can only include solutions that
have no redundant variables. 

The initial solution is non-redundant due to the assumption of the theorem. 
We prove by induction that this property also holds for all solutions that are 
later on accepted by the algorithm. Let X be a non-redundant solution of the 
current population and Y be a redundant offspring created by a single operation.
The only operations that can lead to redundant variables in Y are substitute 
and insert. If substitute introduces a redundant variable, it has to remove a 
non-redundant variable at the same time. This decreases the fitness while the 
complexity stays the same. Hence, such a step is not accepted. If an insertion 
operation introduces a redundant variable then the fitness stays the same and 
the complexity increases. Such steps are also not accepted. Hence, the algorithm 
will not accept a redundant solution at any point of the optimization run. There
are at most n variables that can be expressed. Hence, the complexity can only
take on n + 1 different values, namely $0, 1, 3, 5, \dots, 2n - 1$. This implies
that the population size is upper bounded by n + 1.
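
The two rejection cases can be made concrete for WMAJORITY. This is a minimal sketch under stated assumptions: a tree is represented only by its list of leaves, with +i for x_i and -i for its negation; x_i counts as expressed if it occurs at least once and at least as often as its negation; and an offspring is rejected if its parent dominates it. All function names are hypothetical.

```python
from collections import Counter


def wmajority(leaves, weights):
    """Sum of the weights of all expressed variables (assumed definition)."""
    counts = Counter(leaves)
    return sum(
        w
        for i, w in enumerate(weights, start=1)
        if counts[i] >= 1 and counts[i] >= counts[-i]
    )


def complexity(leaves):
    """Number of nodes of a binary tree with len(leaves) leaves."""
    return 2 * len(leaves) - 1 if leaves else 0


def dominates(a, b):
    """a and b are (fitness, complexity) pairs."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


def is_rejected(parent, child, weights):
    """True if the parent dominates the child in both objectives."""
    parent_value = (wmajority(parent, weights), complexity(parent))
    child_value = (wmajority(child, weights), complexity(child))
    return dominates(parent_value, child_value)
```

With weights (8, 4, 2, 1) and the parent consisting of x_1 and x_2, inserting a second copy of x_1 keeps the fitness and raises the complexity, while substituting x_2 by the negation of x_1 lowers the fitness at equal complexity. Both offspring are dominated by the parent, whereas inserting the new variable x_3 is not.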

We now study how to obtain the different Pareto optimal solutions. We first
analyze the expected time until the population contains the empty tree, which
is the Pareto optimal solution of lowest complexity. Let X be the solution in
the population that currently has the lowest complexity. If we delete one of
its variables, we get a new solution Y with C(Y) < C(X). The probability for
such a step is at least $\frac{1}{3} \cdot \frac{1}{n+1}$ and the expected waiting
time to produce such a solution Y is $O(n)$. There are at most n such steps until
the empty tree has been reached, which implies that the empty tree is included
in the population after an expected number of $O(n^2)$ steps.

The empty tree is a Pareto optimal solution as it has complexity 0. A solution 
of complexity $2j - 1$ is Pareto optimal if the tree contains, for each of the
j largest weights, the corresponding variable exactly once as a positive variable.

Let P be a population that contains for all WORDER-values (the same arguments
can be used for WMAJORITY-values)

$$\sum_{k=1}^{j} w_k, \quad 0 \le j \le i < n,$$

a Pareto optimal solution. In order to obtain a population that contains for all
values

$$\sum_{k=1}^{j} w_k, \quad 0 \le j \le i + 1 \le n,$$

a Pareto optimal solution, the algorithm can choose the Pareto optimal solution
X of weight

$$\sum_{k=1}^{i} w_k$$

for mutation and insert the variable $x_{i+1}$ at any position of the tree X. The
probability of such a step is at least

$$\frac{1}{n+1} \cdot \frac{1}{3} \cdot \frac{1}{2n} = \Omega(1/n^2)$$

and the expected waiting time for such a step is therefore $O(n^2)$. A population
containing for each Pareto optimal objective vector one single solution is
obtained after at most n such steps, which implies that the expected optimization
time is upper bounded by $O(n^3)$. □
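
The argument of the proof can be observed in a small simulation. The following is a simplified sketch and not the exact algorithm analyzed here. It assumes that a tree is represented only by its in-order list of leaves (+i for x_i, -i for its negation), that a variable is expressed in WORDER if its positive literal occurs before its negation, that one of insert, delete, and substitute is applied uniformly at random, and that an offspring is accepted unless a population member dominates it. All names are hypothetical.

```python
import random


def objective(leaves, weights):
    """(WORDER value, complexity) of a tree given by its in-order leaves."""
    seen, fitness = set(), 0
    for lit in leaves:
        if abs(lit) not in seen:  # the first occurrence decides
            seen.add(abs(lit))
            if lit > 0:
                fitness += weights[lit - 1]
    return fitness, (2 * len(leaves) - 1 if leaves else 0)


def weakly_dominates(a, b):
    return a[0] >= b[0] and a[1] <= b[1]


def mutate(leaves, n, rng):
    """Apply one random insert, delete, or substitute to the leaf list."""
    child = list(leaves)
    literal = rng.choice([1, -1]) * rng.randint(1, n)
    op = rng.choice(["insert", "delete", "substitute"])
    if op == "insert":
        child.insert(rng.randrange(len(child) + 1), literal)
    elif child and op == "delete":
        del child[rng.randrange(len(child))]
    elif child:
        child[rng.randrange(len(child))] = literal
    return tuple(child)


def smo_gp_single(weights, start, max_steps, seed=0):
    """Run until the population covers the whole Pareto front.

    Assumes weights are positive and sorted in non-increasing order.
    Returns (steps, population) or None if max_steps is exceeded.
    """
    rng = random.Random(seed)
    n = len(weights)
    front = {(0, 0)} | {(sum(weights[:i]), 2 * i - 1) for i in range(1, n + 1)}
    population = {tuple(start): objective(start, weights)}
    for step in range(1, max_steps + 1):
        parent = rng.choice(sorted(population))
        child = mutate(parent, n, rng)
        value = objective(child, weights)
        if any(weakly_dominates(v, value) and v != value
               for v in population.values()):
            continue  # the child is dominated and therefore rejected
        # Remove everything the child weakly dominates, then add the child.
        population = {t: v for t, v in population.items()
                      if not weakly_dominates(value, v)}
        population[child] = value
        if set(population.values()) == front:
            return step, population
    return None
```

Calling `smo_gp_single([8, 4, 2, 1], start=[3], max_steps=200000)` starts from the non-redundant tree consisting of x_3. In this sketch the population never holds more than n + 1 trees, and every tree in the final population consists of distinct positive variables only.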

5 Conclusions 

With this paper we have contributed to the theoretical understanding of
genetic programming. Such algorithms often encounter the bloat problem, which
means that syntax trees grow during the optimization process without providing
additional benefit. One way of dealing with the bloat problem is to take
the complexity as an additional criterion to measure the quality of a solution.
We have studied the (1+1) GP on multi-criteria fitness functions for WORDER
and WMAJORITY. These problems are generalizations of ORDER and MAJORITY
analyzed in [7] and we have given better upper bounds than the ones
presented in [7].

Afterwards, we analyzed a multi-objective genetic programming algorithm
called SMO-GP. This algorithm is inspired by the SEMO algorithm, which has
been considered in several studies on the computational complexity of
evolutionary multi-objective optimization. We are optimistic that it can serve
for further studies on the computational complexity of multi-objective genetic
programming. We have shown that the Pareto fronts of MO-ORDER and MO-MAJORITY
are computed by SMO-GP in expected time $O(nT_{\mathrm{init}} + n^2 \log n)$.
Furthermore, we have extended our investigations to MO-WORDER and MO-WMAJORITY,
which can encounter an exponential number of trade-off objective vectors.
However, the size of the Pareto front is linear with respect to the problem
dimension and SMO-GP-single computes this Pareto front in expected time
$O(n^3)$ when starting with a non-redundant solution.

We finish with two interesting topics for future work. 



• Determine the expected optimization time of (1+1) GP-multi, which chooses
  k according to 1 + Pois(1), on MO-WORDER and WORDER, MO-WMAJORITY,
  and WMAJORITY.

• Determine the expected optimization time of SMO-GP-multi on MO-WORDER
  and MO-WMAJORITY.